Assigning Grammatical Relations with a Back-off Model 
Abstract

This paper presents a corpus-based method to assign grammatical subject/object relations to ambiguous German constructs. It makes use of an unsupervised learning procedure to collect training and test data, and the back-off model to make assignment decisions.



1 Introduction

Assigning a parse structure to the German sentence (1) involves addressing the fact that it is syntactically ambiguous:

(1) Eine hohe Inflationsrate erwartet die Ökonomin.
a high inflation rate expects the economist
'The economist expects a high inflation rate.'

In this sentence it must be determined which nominal phrase is the subject of the verb. The verb erwarten ('to expect') takes, in one reading, a nominative NP as its subject and an accusative NP as its object. The nominal phrases preceding and following the verb in (1) are both ambiguous with respect to case; they may be nominative or accusative. Further, both NPs agree in number with the verb, and since in German any major constituent may be fronted in a verb-second clause, both NPs may be the subject/object of the verb. In this example, morpho-syntactical information is not sufficient to determine that the nominal phrase [NP die Ökonomin] ('the economist') is the subject of the verb, and [NP Eine hohe Inflationsrate] ('a high inflation rate') its object.

Determining the subject/object of an ambiguous construct such as (1) with a knowledge-based approach requires (at least) a lexical representation specifying the classes of entities which may serve as arguments in the relation(s) denoted by each verb in the vocabulary, as well as membership information with respect to these classes for all entities denoted by nouns in the vocabulary. One problem with this approach is that such a representation is usually not available for a broad-coverage system.

This paper proposes an approximation, similar to the empirical approaches to PP attachment decision (Hindle and Rooth, 1993; Ratnaparkhi, Reynar, and Roukos, 1994; Collins and Brooks, 1995). These approaches make use of unambiguous examples provided by a treebank or a learning procedure in order to train a model to decide the attachment of ambiguous constructs. In the current setting, this approach involves learning the classes of nouns occurring unambiguously as subject/object of a verb in sample text, and using the classes thus obtained to disambiguate ambiguous constructs.

Unambiguous examples are provided by sentences in which morpho-syntactical information suffices to determine the subject and object of the verb. For instance, in (2), the nominal phrase [NP der Ökonom] with a masculine head noun is unambiguously nominative, identifying it as the subject of the verb. In (3), both NPs are ambiguous with respect to case; however, the nominal phrase [NP Die Ökonomen] with a plural head noun is the only one to agree in number with the verb, identifying it as its subject.

(2) Eine hohe Inflationsrate erwartet der Ökonom.
a high inflation rate expects the economist
'The economist expects a high inflation rate.'

(3) Die Ökonomen erwarten eine hohe Inflationsrate.
the economists expect a high inflation rate
'The economists expect a high inflation rate.'

This paper describes a procedure to determine the subject and object in ambiguous German constructs automatically. It is based on shallow parsing techniques employed to collect training and test data from (un)ambiguous examples in a text corpus, and the back-off model to determine which NP in a morpho-syntactically ambiguous construct is the subject/object of the verb, based on the evidence provided by the collected training data.

2 Collecting Training and Test Data 

Shallow parsing techniques are used to collect train- 
ing and test data from a text corpus. The corpus 
is tokenized, morphologically analyzed, lemmatized, 
and parsed using a standard CFG parser with a 
hand-written grammar to identify clauses containing 
a finite verb taking a nominative NP as its subject 
and an accusative NP as its object. 

Constructs covered by the grammar include verb- 
second and verb-final clauses. Each clause is seg- 
mented into phrase-like constituents, including nom- 
inative (NC), prepositional (PC), and verbal (VC) 
constituents. Their definition is non-standard; for 
instance, all prepositional phrases, whether comple- 
ment or not, are left unattached. As an example, 
the shallow parse structure for the sentence in (4) is 
shown in (4') below. 

(4) Die Gesellschaft erwartet in diesem Jahr 
the society expects in this year 
in Südostasien einen Umsatz
in southeast Asia a turnover 

von 125 Millionen DM. 
from 125 million DM 

'The society expects this year in southeast Asia 
a turnover of 125 million DM.' 

(4') [S [NC_{3s,{nom,acc}} Die Gesellschaft]
[VC_{3s} erwartet]
[PC in diesem Jahr]
[PC in Südostasien]
[NC_{3s,acc} einen Umsatz]
[PC von 125 Millionen DM]
]

Nominal and verbal constituents display person and 
number information; nominal constituents also dis- 
play case information. For instance in the structure 
above, 3 denotes third person, s denotes singular 
number, nom and acc denote nominative and ac- 
cusative case, respectively. The set {nom, acc} indi- 
cates that the first nominal constituent in the struc- 
ture is ambiguous with respect to case; it may be 
nominative or accusative. 

Test and training tuples are obtained from shallow structures containing a verbal constituent and two nominative/accusative nominal constituents. Note that no subcategorization information is used; it suffices for a verb to occur in a clause with two nominative/accusative NCs for it to be considered testing/training data.

Training data consists of tuples $(n_1, v, n_2, x)$, where $v$ is a verb, $n_1$ and $n_2$ are nouns, and $x \in \{1, 0\}$ indicates whether $n_1$ is the subject of the verb. Test data consists of ambiguous tuples $(n_1, v, n_2)$ for which it cannot be established which noun is the subject/object of the verb based on morpho-syntactical information alone.

The set of training and test tuples for a given corpus is obtained as follows. For each shallow structure $s$ in the corpus containing one verbal and two nominative/accusative nominal constituents, let $n_1, v, n_2$ be such that $v$ is the main verb in $s$, and $n_1$ and $n_2$ are the heads of the nominative/accusative NCs in $s$ such that $n_1$ precedes $n_2$ in $s$. In the rules below, $i, j \in \{1, 2\}$, $j \neq i$, and $g(i) = 1$ if $i = 1$, and $0$ otherwise. Note that the last element in a training tuple indicates whether the first NC in the structure is the subject of the verb (1 if so, 0 otherwise).

Case Nominative Rule. If $n_i$ is masculine, and the NC headed by $n_i$ is unambiguously nominative¹, then $(n_1, v, n_2, g(i))$ is a training tuple.

Case Accusative Rule. If $n_j$ is masculine, and the NC headed by $n_j$ is unambiguously accusative, then $(n_1, v, n_2, g(i))$ is a training tuple.

Agreement Rule. If $n_i$ but not $n_j$ agrees with $v$ in person and number, then $(n_1, v, n_2, g(i))$ is a training tuple.

Heuristic Rule. If the shallow structure consists of a verb-second clause with an adverbial in the first position, or of a verb-final clause introduced by a conjunction or a complementizer, then $(n_1, v, n_2, 1)$ is a training tuple (see below for examples).

Default Rule. $(n_1, v, n_2)$ is a test triple.

For instance, the training tuple (Gesellschaft, erwarten, Umsatz, 1) ('society, expect, turnover') is obtained from the structure (4') above with the Case Accusative Rule, since the NC headed by the masculine noun Umsatz ('turnover') is unambiguously accusative and hence the object of the verb. The training tuples (Inflationsrate, erwarten, Ökonom, 0) ('inflation rate, expect, economist') and (Ökonom, erwarten, Inflationsrate, 1) ('economist, expect, inflation rate') are obtained from sentences (2) and (3) with the Case Nominative and Agreement Rules, respectively, and the test tuple (Inflationsrate, erwarten, Ökonomin) ('inflation rate, expect, economist') from the ambiguous sentence in (1) by the Default Rule.

1 Only NCs with a masculine head noun may be un- 
ambiguous with respect to nominative/accusative case 
in German. 



The Heuristic Rule is based on the observation 
that in the constructs stipulated by the rule, al- 
though the object may potentially precede the sub- 
ject of the verb, this does not (usually) occur in writ- 
ten text. (5) and (6) are sentences to which this rule 
applies. 

(5) In diesem Jahr erwartet die Ökonomin
in this year expects the economist
eine hohe Inflationsrate.
a high inflation rate
'This year the economist expects a high inflation rate.'

(6) Weil die Ökonomin eine hohe Inflationsrate
because the economist a high inflation rate
erwartet, . . .
expects
'Because the economist expects a high inflation rate, . . .'

Note that the Heuristic Rule does not apply to verb- 
final clauses introduced by a relative or interrogative 
item, such as in (7): 

(7) Die Rate, die die Ökonomin erwartet, . . .
the rate which the economist expects, . . . 
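The extraction rules of this section can be sketched in Python as follows. This is an illustration, not the system described here: the encoding of constituents (the hypothetical classes `NC` and `Clause`, and an `opener` field standing in for the clause-type tests of the Heuristic Rule) is assumed, and the sketch assumes the rules are tried in the order listed, with the first applicable rule deciding. The Case Accusative Rule is written so that it reproduces the tuple derived from (4').

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class NC:
    """A nominative/accusative nominal constituent (hypothetical encoding)."""
    head: str              # lemma of the head noun
    masculine: bool        # gender of the head noun
    cases: FrozenSet[str]  # possible cases: a subset of {"nom", "acc"}
    person: int
    number: str            # "sg" or "pl"


@dataclass(frozen=True)
class Clause:
    verb: str              # lemma of the finite main verb
    person: int            # person and number of the finite verb
    number: str
    nc1: NC                # first nominative/accusative NC
    nc2: NC                # second one; nc1 precedes nc2
    opener: str = "none"   # e.g. "adverbial", "conjunction", "complementizer", "relative"


# Openers that trigger the Heuristic Rule; relative/interrogative items do not.
HEURISTIC_OPENERS = {"adverbial", "conjunction", "complementizer"}
PAIRS = ((1, 2), (2, 1))   # all (i, j) with i, j in {1, 2} and j != i


def g(i: int) -> int:
    return 1 if i == 1 else 0


def agrees(nc: NC, c: Clause) -> bool:
    return (nc.person, nc.number) == (c.person, c.number)


def extract(c: Clause) -> Union[Tuple[str, str, str, int], Tuple[str, str, str]]:
    """Return a training tuple (n1, v, n2, x) or a test triple (n1, v, n2)."""
    nc = {1: c.nc1, 2: c.nc2}
    n1, v, n2 = c.nc1.head, c.verb, c.nc2.head
    # Case Nominative Rule: a masculine NC that is unambiguously nominative is the subject.
    for i, _ in PAIRS:
        if nc[i].masculine and nc[i].cases == {"nom"}:
            return (n1, v, n2, g(i))
    # Case Accusative Rule: if n_j is unambiguously accusative, n_i is the subject.
    for i, j in PAIRS:
        if nc[j].masculine and nc[j].cases == {"acc"}:
            return (n1, v, n2, g(i))
    # Agreement Rule: only n_i agrees with the verb in person and number.
    for i, j in PAIRS:
        if agrees(nc[i], c) and not agrees(nc[j], c):
            return (n1, v, n2, g(i))
    # Heuristic Rule: the first NC is taken to be the subject.
    if c.opener in HEURISTIC_OPENERS:
        return (n1, v, n2, 1)
    # Default Rule: an ambiguous test triple.
    return (n1, v, n2)
```

Encoding (2) as `Clause("erwarten", 3, "sg", NC("Inflationsrate", False, frozenset({"nom", "acc"}), 3, "sg"), NC("Ökonom", True, frozenset({"nom"}), 3, "sg"))` yields the training tuple `("Inflationsrate", "erwarten", "Ökonom", 0)`, as in the text.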

3 Testing 

The testing algorithm makes use of the back-off model (Katz, 1987) in order to determine the subject/object in an ambiguous test tuple. The model, developed within the context of speech recognition, consists of a recursive procedure to estimate n-gram probabilities from sparse data. Its generality makes it applicable to other areas; the method has been used, for instance, to solve prepositional phrase attachment in (Collins and Brooks, 1995).



3.1 Katz's back-off model 

Let $w_1^n$ denote the n-gram $w_1, \ldots, w_n$, and $f(w_1^n)$ denote the number of times it occurred in a sample text. The back-off estimate computes the probability of a word given the $n-1$ preceding words. It is defined recursively as follows. (In the formulae below, $\alpha(w_1^{n-1})$ is a normalizing factor and $d_r$ a discount coefficient. See (Katz, 1987) for a detailed
account of the model.) 
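In the form in which Katz's recursion is commonly stated, with $r = f(w_1^n)$, the estimate can be written as follows (a sketch in the notation above; Katz leaves counts above a small threshold undiscounted, i.e. $d_r = 1$ for large $r$, which is folded into $d_r$ here):

```latex
P_{bo}(w_n \mid w_1^{n-1}) =
\begin{cases}
  d_r \, \dfrac{f(w_1^n)}{f(w_1^{n-1})}, & \text{if } r = f(w_1^n) > 0,\\[6pt]
  \alpha(w_1^{n-1}) \, P_{bo}(w_n \mid w_2^{n-1}), & \text{otherwise.}
\end{cases}
```

Here $d_r \le 1$ reserves probability mass for n-grams unseen in training, and $\alpha(w_1^{n-1})$ is chosen so that exactly this mass is redistributed over the lower-order estimate $P_{bo}(w_n \mid w_2^{n-1})$.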
3.2 The Revised Model



In the current context, instead of estimating the probability of a word given the $n-1$ preceding words, we estimate the probability that the first noun $n_1$ in a test triple $(n_1, v, n_2)$ is the subject of the verb $v$, i.e., $P(S = 1 \mid N_1 = n_1, V = v, N_2 = n_2)$, where $S$ is an indicator random variable ($S = 1$ if the first noun in the triple is the subject of the verb, $0$ otherwise).

In the estimate $P_{bo}(w_n \mid w_1^{n-1})$ only one relation, the precedence relation, is relevant to the problem; in the current setting, one would like to make use of two implicit relations in the training tuple, subject and object, in order to produce an estimate for $P(1 \mid n_1, v, n_2)$. The model below is similar to that in (Collins and Brooks, 1995).

Let $C$ be the set of lemmata occurring in the training tuples obtained from a sample text, and let $c(n_1, v, n_2, x)$ denote the frequency count obtained for the training tuple $(n_1, v, n_2, x)$ ($x \in \{0, 1\}$). We define the count $f_{so}(n_1, v, n_2) = c(n_1, v, n_2, 1) + c(n_2, v, n_1, 0)$ of $n_1$ as the subject and $n_2$ as the object of $v$. Further, we define the count $f_s(n_1, v) = \sum_{n_2 \in C} f_{so}(n_1, v, n_2)$ of $n_1$ as the subject of $v$ with any object, and analogously, the count $f_o(n_1, v) = \sum_{n_2 \in C} f_{so}(n_2, v, n_1)$ of $n_1$ as the object of $v$ with any subject. Further, we define the counts $f_s(v) = \sum_{n_1, n_2 \in C} c(n_1, v, n_2, 1)$ and $f_o(v) = \sum_{n_1, n_2 \in C} c(n_1, v, n_2, 0)$, i.e., the number of training tuples with verb $v$ whose first noun is, respectively is not, the subject. The estimate $P_i(1 \mid n_1, v, n_2)$ ($0 \le i \le 3$) is defined recursively as follows:



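$$
P_0(1 \mid n_1, v, n_2) = 1.0
$$

$$
P_i(1 \mid n_1, v, n_2) =
\begin{cases}
\dfrac{c_i(n_1, v, n_2)}{t_i(n_1, v, n_2)}, & \text{if } t_i(n_1, v, n_2) > 0,\\[6pt]
P_{i-1}(1 \mid n_1, v, n_2), & \text{otherwise,}
\end{cases}
\qquad (1 \le i \le 3)
$$

where the counts $c_i(n_1, v, n_2)$ and $t_i(n_1, v, n_2)$ are defined as follows:

$$
c_i(n_1, v, n_2) =
\begin{cases}
f_{so}(n_1, v, n_2), & \text{if } i = 3,\\
f_s(n_1, v) + f_o(n_2, v), & \text{if } i = 2,\\
f_s(v), & \text{if } i = 1,
\end{cases}
$$

$$
t_i(n_1, v, n_2) =
\begin{cases}
f_{so}(n_1, v, n_2) + f_{so}(n_2, v, n_1), & \text{if } i = 3,\\
f_s(n_1, v) + f_o(n_1, v) + f_s(n_2, v) + f_o(n_2, v), & \text{if } i = 2,\\
f_s(v) + f_o(v), & \text{if } i = 1.
\end{cases}
$$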

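A minimal Python sketch of the counts $f_{so}$, $f_s$, $f_o$, $f_s(v)$, $f_o(v)$ and of the recursion from $P_3$ down to $P_0$ is given below. The class name `RevisedBackoff` and its methods are illustrative; the sketch assumes training tuples are given as `(n1, v, n2, x)` lemma tuples with `x = 1` when `n1` is the subject, and it covers only the case in which both heads are nouns.

```python
from collections import Counter
from typing import Iterable, List, Tuple


class RevisedBackoff:
    """Back-off estimate P_i(1 | n1, v, n2), i = 3..0, from training tuples
    (n1, v, n2, x), where x = 1 iff n1 is the subject of v."""

    def __init__(self, training: Iterable[Tuple[str, str, str, int]]):
        self.f_so = Counter()  # (subject, v, object) -> f_so
        self.f_s = Counter()   # (noun, v) -> times noun is subject of v (any object)
        self.f_o = Counter()   # (noun, v) -> times noun is object of v (any subject)
        self.fv_s = Counter()  # v -> f_s(v): tuples with x = 1
        self.fv_o = Counter()  # v -> f_o(v): tuples with x = 0
        for n1, v, n2, x in training:
            subj, obj = (n1, n2) if x == 1 else (n2, n1)
            self.f_so[subj, v, obj] += 1   # accumulates c(n1,v,n2,1) + c(n2,v,n1,0)
            self.f_s[subj, v] += 1         # same as summing f_so over all objects
            self.f_o[obj, v] += 1          # same as summing f_so over all subjects
            (self.fv_s if x == 1 else self.fv_o)[v] += 1

    def levels(self, n1: str, v: str, n2: str) -> List[Tuple[int, int]]:
        """Pairs (c_i, t_i) for i = 3, 2, 1 as defined in section 3.2."""
        s, o = self.f_s, self.f_o
        return [
            (self.f_so[n1, v, n2], self.f_so[n1, v, n2] + self.f_so[n2, v, n1]),
            (s[n1, v] + o[n2, v], s[n1, v] + o[n1, v] + s[n2, v] + o[n2, v]),
            (self.fv_s[v], self.fv_s[v] + self.fv_o[v]),
        ]

    def estimate(self, n1: str, v: str, n2: str) -> Tuple[float, int]:
        """Return (P, i) for the first level with t_i > 0, falling back to P_0 = 1.0."""
        for i, (c, t) in zip((3, 2, 1), self.levels(n1, v, n2)):
            if t > 0:
                return c / t, i
        return 1.0, 0
```

Storing $f_s$ and $f_o$ per (noun, verb) pair while reading the training data avoids summing $f_{so}$ over $C$ at query time; `Counter` returns 0 for unseen keys without storing them, which matches the definition of the counts.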
The definition of $P_3(1 \mid n_1, v, n_2)$ is analogous to that of $P_{bo}(w_n \mid w_1^{n-1})$. In the case where the counts are positive, the numerator in the latter is the number of times the word $w_n$ followed the n-gram $w_1^{n-1}$ in training data, and in the former, the number of times $n_1$ occurred as the subject with $n_2$ as the object of $v$. This count is divided, in the latter, by the number of times the n-gram $w_1^{n-1}$ was seen in training data, and in the former, by the number of times $n_1$ was seen as the subject or object of $v$ with $n_2$ as its object/subject, respectively.

However, the definition of $P_2(1 \mid n_1, v, n_2)$ is somewhat different; it makes use of both the subject and object relations implicit in the tuple. In $P_2(1 \mid n_1, v, n_2)$, one combines the evidence for $n_1$ as the subject of $v$ (with any object) with that of $n_2$ as the object of $v$ (with any subject).

At the $P_1$ level, only the counts obtained for the verb are used in the estimate; although for certain verbs some nouns may have definite preferences for appearing in the subject or object position, this information was deemed on empirical grounds not to be appropriate for all verbs.

When the verb $v$ in a test tuple $(n_1, v, n_2)$ does not occur in any training tuple, the default $P_0(1 \mid n_1, v, n_2) = 1.0$ is used; it reflects the fact that constructs in which the first noun is the subject of the verb are more common.

3.3 Decision Algorithm

The decision algorithm determines, for a given test tuple $(n_1, v, n_2)$, which noun is the subject of the verb $v$. In case one of the nouns in the tuple is a pronoun, it does not make sense to predict that it is the subject/object of a verb based on how often it occurred unambiguously as such in a sample text. In this case, only the information provided by training data for the other noun in the test tuple is used. Further, in case both heads in a test tuple are pronouns, the tuple is not considered. The algorithm is as follows.

If $n_1$ and $n_2$ are both nouns, then $n_1$ is the subject of $v$ if $P_3(1 \mid n_1, v, n_2) > 0.5$, else its object. In case $n_2$ (but not $n_1$) is a pronoun, redefine $c_i$ and $t_i$ as follows:

$$
c_i(n_1, v, n_2) =
\begin{cases}
f_s(n_1, v), & \text{if } i = 2,\\
f_s(v), & \text{if } i = 1,
\end{cases}
\qquad
t_i(n_1, v, n_2) =
\begin{cases}
f_s(n_1, v) + f_o(n_1, v), & \text{if } i = 2,\\
f_s(v) + f_o(v), & \text{if } i = 1,
\end{cases}
$$

and calculate $P_2(1 \mid n_1, v, n_2)$ with these new definitions. If $P_2(1 \mid n_1, v, n_2) > 0.5$, then $n_1$ is the subject of the verb $v$, else its object. We proceed analogously in case $n_1$ (but not $n_2$) is a pronoun.

3.4 Related Work

In (Collins and Brooks, 1995) the back-off model is used to decide PP attachment given a tuple $(v, n_1, p, n_2)$, where $v$ is a verb, $n_1$ and $n_2$ are nouns, and $p$ a preposition such that the PP headed by $p$ may be attached either to the verb phrase headed by $v$ or to the NP headed by $n_1$, and $n_2$ is the head of the NP governed by $p$.



The model presented in section 3.2 is similar to that in (Collins and Brooks, 1995); however, unlike (Collins and Brooks, 1995), who use examples from
a treebank to train their model, the procedure de- 
scribed in this paper uses training data automati- 
cally obtained from sample text. Accordingly, the 
model must cope with the fact that training data is 
much more likely to contain errors. The next sec- 
tion evaluates the decision algorithm as well as the 
training data obtained by the learning procedure. 
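The decision procedure of section 3.3, including the pronoun variants, can be sketched as below. To keep it separate from how the counts are built, it receives the count tables as plain dictionaries; the `Counts` container and the `is_pronoun` predicate are hypothetical. The branch for a pronoun in first position follows one reading of "we proceed analogously": the evidence for $n_1$ as subject is taken to be the evidence for $n_2$ as object.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple


class Counts(NamedTuple):
    """Count tables of the revised model (hypothetical container)."""
    f_so: Dict[Tuple[str, str, str], int]  # (subject, verb, object) -> f_so
    f_s: Dict[Tuple[str, str], int]        # (noun, verb) -> f_s(noun, verb)
    f_o: Dict[Tuple[str, str], int]        # (noun, verb) -> f_o(noun, verb)
    fv_s: Dict[str, int]                   # verb -> f_s(verb)
    fv_o: Dict[str, int]                   # verb -> f_o(verb)


def backoff(levels: List[Tuple[int, int]]) -> float:
    """Return c_i / t_i for the first level with t_i > 0, else P_0 = 1.0."""
    for c, t in levels:
        if t > 0:
            return c / t
    return 1.0


def decide(n1: str, v: str, n2: str, k: Counts,
           is_pronoun: Callable[[str], bool]) -> Optional[str]:
    """Return the head chosen as subject of v, or None if both heads are pronouns."""
    pro1, pro2 = is_pronoun(n1), is_pronoun(n2)
    if pro1 and pro2:
        return None  # such tuples are not considered

    def subj(n: str) -> int:
        return k.f_s.get((n, v), 0)

    def obj(n: str) -> int:
        return k.f_o.get((n, v), 0)

    verb_level = (k.fv_s.get(v, 0), k.fv_s.get(v, 0) + k.fv_o.get(v, 0))
    if pro2:    # only the evidence for n1 as subject is used (P_2, then P_1)
        levels = [(subj(n1), subj(n1) + obj(n1)), verb_level]
    elif pro1:  # assumed analogue: only the evidence for n2 as object is used
        levels = [(obj(n2), subj(n2) + obj(n2)), verb_level]
    else:       # both nouns: P_3, then P_2, then P_1
        so = k.f_so.get((n1, v, n2), 0)
        rev = k.f_so.get((n2, v, n1), 0)
        levels = [(so, so + rev),
                  (subj(n1) + obj(n2), subj(n1) + obj(n1) + subj(n2) + obj(n2)),
                  verb_level]
    return n1 if backoff(levels) > 0.5 else n2
```

When every level is empty (the verb never occurred in training data), `backoff` returns $P_0 = 1.0$ and the first head is chosen as subject.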

4 Results 

The method described in the previous section was 
applied to a text corpus consisting of 5 months of the 
newspaper Frankfurter Allgemeine Zeitung with ap- 
proximately 15 million word-like tokens. The learn- 
ing procedure produced a total of 24,178 test tuples 
and 47,547 training triples. 

4.1 Learning procedure 

In order to evaluate the data used to train the model, 
1000 training tuples were examined. Of these tuples, 
127 were considered to be (partially) incorrect based 
on the judgments of a single judge given the original 
sentence. Errors in training and test data may stem 
from the morphology component, from the grammar 
specification, from the heuristic rule, or from actual 
errors in the text. 

4.1.1 Subcategorization Information 

The system works without subcategorization in- 
formation; it suffices for a verb to occur with a possi- 
bly nominative and a possibly accusative NC for it to 
be considered training/test data. Lack of subcategorization leads to errors when verbs occurring with an
(ambiguous) dative NC are mistaken for verbs which 
subcategorize for an accusative nominal phrase. For 
instance, in (7) below, the verb gehören ('to belong') takes, in one reading, a dative NP as its object and a nominative NP as its subject. Since the nominal constituent [NC Bill] is ambiguous with respect to case and possibly accusative, the erroneous tuple (Wagen, gehören, Bill, 1) ('car, belong, Bill') is produced for this sentence.



(7) Der Wagen gehört Bill.
the car belongs Bill 
'The car belongs to Bill.' 

Another source of errors is the fact that any ac- 
cusative NC is considered an object of the verb. 
For instance in sentence (8), the verb trainieren ('to 
train') occurs with two NCs. Since the NC preced- 
ing the verb is unambiguously nominative and the 
one following the verb possibly accusative, the train- 
ing tuple (Tennisspieler, trainieren, Jahr, 1) ('ten- 
nis player, train, year') is produced for this sentence, 
although the second NC is not an object of the verb. 

(8) Der Tennisspieler trainiert das ganze Jahr.
the tennis player trains the whole year 

4.1.2 Homographs 

In sentence (9) below, the word morgen ('to- 
morrow') is an adverb. However, its capitalized 
form may also be a noun, leading in this case to 
the erroneous training tuple (Morgen, trainieren, 
Tennisspieler, 0) (since [NC der Tennisspieler] is unambiguously nominative).

(9) Morgen trainiert der Tennisspieler. 
tomorrow trains the tennis player 
'The tennis player will train tomorrow.' 

4.1.3 Separable Prefixes 

In German, verb prefixes can be separated from the verb. When a finite (separable prefix) main verb occupies the second position in the clause, its prefix takes the last position in the clause core. For example, in sentence (10) below, the prefix zurück of the verb zurückweisen ('to reject') follows the object of the verb and a subordinate clause with a subjunctive main verb. This construct is not covered by the current version of the grammar. However, due to the grammar definition, and since weisen is also a verb (without a separable prefix) in German, [C Er weist die Kritik der Prinzessin] is still accepted as a valid clause, leading to the erroneous training tuple (er, weisen, Kritik, 1) ('he, point, criticism'). Such errors may be avoided with further development of the grammar.

(10) Er weist die Kritik der Prinzessin, seine
he rejects the criticism the princess his
Ohren seien zu groß, zurück.
ears are too big PRT
'He rejects the princess' criticism that his ears are too big.'

4.1.4 Constituent Heads 

The system is not always able to determine constituent heads correctly. For instance, in sentence (11), all words in the name Mexikanische Verband für Menschenrechte are capitalized. Upon encountering the adjective Mexikanische, the system takes it to be a noun (nouns are capitalized in German), followed by the noun Verband "in apposition". Sentence (11) is the source of the erroneous training tuple (Mexikanisch, beschuldigen, Behörde, 1) ('Mexican, blame, public authorities').

(11) Der Mexikanische Verband für Menschen-
the Mexican Association for Human
rechte beschuldigt die Behörden.
Rights blames the public authorities 

'The Mexican Association for Human Rights 
blames the public authorities.' 

4.1.5 Multi-word lexical units 

The learning procedure has no access to multi- 
word lexical units. For instance in sentence (12), the 
first word in the expression Hand in Hand is consid- 
ered the object of the verb, leading to the training 
tuple (Architekten, arbeiten, Hand, 1) ('architect, 
work, hand'). Given the information the system has 
access to, such errors cannot be avoided. 

(12) Alle Architekten sollen Hand in Hand arbeiten. 
all architects should hand in hand work 

'All architects should work hand in hand.' 

4.1.6 Source Text 

Errors in the source text, not only spelling errors, are also a source of incorrect tuples. For instance, in sentence (13), the verb suchen ('to seek') is erroneously in the third person plural. Since Reihe ('series') in German is a singular noun, and Kontakte ('contacts') plural, the actual object, but not the subject, agrees in number with the verb, so the incorrect tuple (Reihe, suchen, Kontakt, 0) ('series, seek, contact') is obtained from this sentence.

(13) *Eine Reihe von Staaten suchen geschäftliche
a series from states seek business
Kontakte zu der Region.
contacts to the region 

'*A series of states seek contacts to the region.' 

Finally, a large number of errors, especially in test
tuples, stems from the fact that soft constraints are 
used for words unknown to the morphology. 

4.2 Decision Algorithm 

In order to evaluate the accuracy of the decision algorithm, 1000 triples were selected from the set of test triples. Of these, 285 contained errors, based on the judgements of a single judge given the original sentence². The results produced by the system for the remaining 715 tuples were compared to the judgements of a single judge given the original text. The system performed with an overall accuracy of 90.49%.

Figure 1: The accuracy of the system at each level

A lower bound for the accuracy of the decision al- 
gorithm can be defined by considering the first noun 
in every test tuple to be the subject of the verb (by 
far the most common construct), yielding for these 
715 tuples an accuracy of 87.83%. 

Figure 1 shows how many of the 715 evaluated test tuples were assigned subject/object based on each of the values $P_i$, and the accuracy of the system at each level.

The accuracy for P2 and P3 exceeds 95%. How- 
ever, their coverage is relatively low (28.81%). Since 
the procedure used to collect training data runs 
without supervision, increasing the size of the train- 
ing set depends only on the availability of sample 
text and should be further pursued. 
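The percentages reported in this section correspond to small integer counts over the 715 evaluated triples. The sketch below (the helper name `implied_counts` is illustrative) recovers, by brute force, the only counts consistent with the rounded figures: 647 correct decisions by the system, 628 for the first-noun baseline, and 206 triples decided at the $P_2$ or $P_3$ level.

```python
EVALUATED = 1000 - 285  # test triples remaining after erroneous ones were set aside


def implied_counts(percent: float, n: int = EVALUATED) -> list:
    """Integer counts k in [0, n] whose share 100 * k / n rounds to `percent` (2 decimals)."""
    return [k for k in range(n + 1) if round(100 * k / n, 2) == percent]


system = implied_counts(90.49)    # overall accuracy of the decision algorithm
baseline = implied_counts(87.83)  # lower bound: first noun always taken as subject
p2_p3 = implied_counts(28.81)     # coverage of the P_2 and P_3 levels

print(EVALUATED, system, baseline, p2_p3)  # 715 [647] [628] [206]
print(system[0] - baseline[0])             # 19
```

The system thus makes 19 more correct decisions than the baseline on the same 715 triples, a difference of about 2.66 percentage points.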

One reason for the relatively low coverage is 
the fact that German compound nouns considerably increase the size of the sample space. For instance, the head of the nominal constituent [NC Der Tennisspieler] ('the tennis player') is considered by
the system to be the compound noun Tennisspieler 
('tennis player'), instead of its head noun Spieler 
('player'). Consistently considering the head of pu- 
tative compound nouns to be the head of nomi- 
nal constituents may in some cases lead to awk- 
ward results. However, reducing the size of the sam- 
ple space by morphological processing of compound 
nouns should be considered in order to increase cov- 
erage. 
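One simple way to reduce the sample space along these lines is sketched below under explicitly hypothetical assumptions: a noun lexicon is available, a head must have at least four letters, and the longest known proper suffix of a putative compound is taken as its head. The function `compound_head` and the toy lexicon are illustrative only.

```python
def compound_head(noun: str, lexicon: set, min_len: int = 4) -> str:
    """Hypothetical head reduction for German compounds: return the longest
    proper suffix of `noun` (at least `min_len` letters) that is itself in
    `lexicon`, capitalized as German nouns are; otherwise return `noun`."""
    for start in range(1, len(noun) - min_len + 1):  # longest suffix first
        candidate = noun[start:].capitalize()
        if candidate in lexicon:
            return candidate
    return noun


LEXICON = {"Spieler", "Rate", "Grenze", "Stellung"}  # toy lexicon (assumption)
for word in ["Tennisspieler", "Inflationsrate", "Altersgrenze", "Gesetz", "Ausstellung"]:
    print(word, "->", compound_head(word, LEXICON))
# Tennisspieler -> Spieler
# Inflationsrate -> Rate
# Altersgrenze -> Grenze
# Gesetz -> Gesetz
# Ausstellung -> Stellung
```

The last mapping illustrates the awkward results mentioned above: Ausstellung is a derivation from the verb ausstellen, not a compound, yet a purely suffix-based lookup treats it as one.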

4.2.1 Examples 

Following are examples of test tuples for which a decision was made based on values of $P_i$. All sentences below stem from the corpus.

Sentence (14) was the source for the test tuple (Ausstellung, zeigen, Spektrum) ('exhibition, show, spectrum'). This tuple was correctly disambiguated with $P_2 = 0.87$, with, among others, the training tuples (Ausstellung, zeigen, Bild, 1) ('exhibition, show, painting'), (Ausstellung, zeigen, Beispiel, 1) ('exhibition, show, example'), and (Ausstellung, zeigen, Querschnitt, 1) ('exhibition, show, cross-section') obtained with the Agreement Rule (sentences (15) and (16)) and the Case Accusative Rule (sentence (17)), respectively.

2 The higher error rate for test tuples is due to the soft constraints used for words unknown to the morphology.

(14) Die Ausstellung zeigt das Spektrum jüdischer
the exhibition shows the spectrum Jewish
Buchkunst von den Anfängen [. . .]
book art from the beginnings
'The exhibition shows the spectrum of Jewish book art from the beginnings [...].'

(15) die letzte Ausstellung vor der Sommerpause
the last exhibition before the summer pause
zeigt Bilder und Zeichnungen von Petra
shows paintings and drawings from Petra
Trenkel zum Thema "Dorf".
Trenkel to the subject village
'The last exhibition before the summer pause shows paintings and drawings by Petra Trenkel on the subject "village".'

(16) Die Ausstellung im Museum für Kunst-
the exhibition in the museum for arts and
handwerk zeigt Beispiele seiner vielfältigen
crafts shows examples his manifold 

Objekt-Typen [. . .] 
object types 

'The exhibition in the museum for arts and 
crafts shows examples of his manifold 
object types [...]' 

(17) Eine vom französischen Kulturinstitut
a from the French culture institute
mit Unterstützung des Börsenvereins
with support the Börsenverein
in der Zentralen Kinder- und Jugendbibliothek
in the central children and youth library
im Bürgerhaus Bornheim
in the community center Bornheim
eingerichtete Ausstellung zeigt
organized exhibition shows
einen interessanten Querschnitt.
an interesting cross-section
'An exhibition in the central children's and youth library in the community center Bornheim, organized by the French culture institute with support of the Börsenverein, shows an interesting cross-section.'

Sentence (18) below was the source for the test tuple 
(Altersgrenze, nennen, Gesetz) ('age limit, mention, 
law'). The system incorrectly considered the noun 
Altersgrenze to be the subject of the verb. 

(18) Eine Altersgrenze nennt das Gesetz nicht. 
an age limit mentions the law not 

'The law does not mention an age limit.' 

There were no training tuples in which the com- 
pound noun Altersgrenze occurred as the sub- 
ject/object of the verb. However, the noun Gesetz 
occurred more frequently as the object of the verb 
nennen than as its subject, leading to the erroneous 
decision. 
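The erroneous decision for (18) follows directly from the definitions in section 3.2. With $n_1$ = Altersgrenze, $v$ = nennen and $n_2$ = Gesetz, both $f_{so}$ terms are zero because Altersgrenze never occurs in training data, so $t_3 = 0$ and the estimate backs off to $P_2$, where all counts involving $n_1$ vanish as well:

```latex
P_2(1 \mid n_1, v, n_2)
  = \frac{f_s(n_1, v) + f_o(n_2, v)}
         {f_s(n_1, v) + f_o(n_1, v) + f_s(n_2, v) + f_o(n_2, v)}
  = \frac{f_o(\textit{Gesetz}, \textit{nennen})}
         {f_s(\textit{Gesetz}, \textit{nennen}) + f_o(\textit{Gesetz}, \textit{nennen})}
```

This value exceeds 0.5 exactly when Gesetz has been seen more often as the object than as the subject of nennen, so Altersgrenze is chosen as the subject solely on the basis of the evidence for Gesetz.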

5 Conclusion 

This paper describes a procedure to automatically 
assign grammatical subject/object relations to am- 
biguous German constructs. It is based on an unsu- 
pervised learning procedure to collect test and train- 
ing data and the back-off model to make assignment 
decisions. The system was implemented and tested 
on a 15-million word newspaper corpus. 

The overall accuracy of the decision algorithm (90.49%) was almost 3 percentage points higher than the established baseline of 87.83%. The accuracy of the procedure for tuples for which a decision was made based on training pairs/triples ($P_2$ and $P_3$) exceeded 95%.

In order to increase the coverage for these cases as 
well as the overall performance of the procedure, the 
sample space should be reduced by morphologically 
processing German compound nouns, and the size of 
the training set should be increased. Further, in the 
experiment described in this paper, the model was 
trained with data obtained by an unsupervised pro- 
cedure which performs with an accuracy of approxi- 
mately 87% for training data. Further development 
of the morphology component and grammar defini- 
tion should lead to improved results. 